24 research outputs found

    Architecture for performing secure computation on encrypted data

    Get PDF
    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2013.Cataloged from PDF version of thesis.Includes bibliographical references (p. 97-101).This thesis considers encrypted computation where the user specifies encrypted inputs to an untrusted batch program controlled by an untrusted server. In batch computation, all data that the program might need is known at program start time. Encrypted computation on untrusted batch programs can be realized through fully homomorphic encryption (FHE) techniques, but FHE's current overheads limit its applicability. Secure processors (e.g., Aegis), coprocessors (e.g., TPM) or hardware extensions (e.g., TXT) typically require trust in the entire processor, the host operating system and the program that computes on the inputs. In this thesis, we design a secure processor architecture, called Ascend, that guarantees privacy of data given untrusted batch programs. The key idea in Ascend to guarantee privacy is parameterizable, obfuscated program execution. From the perspective of the Ascend chip's input/output and power pins, an untrusted server cannot learn anything about private user data regardless of the program run. Ascend uses Oblivious RAM (ORAM) techniques to hide memory access patterns and differential-power analysis (DPA) resistance techniques to hide data-dependent power draw. For each of the input/output and power channels, an Ascend chip exposes a set of public knobs that fully specify the observable behavior of the chip given any batch program and any input to that batch program. These knobs (e.g., specifying strict intervals for when external memory should be accessed) are controlled by the server and can be tuned, based on the server's apriori knowledge of the program, to trade-off performance and power without impacting security. Experimental results when running Ascend on SPEC benchmarks show an average 3.6x /6.6x and 5.2x /4.7x performance/power overhead-when hiding memory access pattern and power draw-using two schemes that capture the server's apriori knowledge in different ways. Furthermore-when hiding memory access pattern only-performance/power overheads drop to only 2.6x/2.2x. These surprising results mean that it is viable to only trust hardware and not software in some security-conscious applications.by Christopher W. Fletcher.S.M

    Towards an interpreter for efficient encrypted computation

    Get PDF
    Fully homomorphic encryption (FHE) techniques are capable of performing encrypted computation on Boolean circuits, i.e., the user specifies encrypted inputs to the program, and the server computes on the encrypted inputs. Applying these techniques to general programs with recursive procedures and data-dependent loops has not been a focus of attention. In this paper, we take a first step toward building an interpreter that, given programs with complex control flow, schedules efficient code suitable for the application of FHE schemes. We first describe how programs written in a small Turing-complete instruction set can be executed with encrypted data and point out inefficiencies in this methodology. We then provide examples of scheduling (a) the greatest common divisor (GCD) problem using Euclid's algorithm and (b) the 3-Satisfiability (3SAT) problem using a recursive backtracking algorithm into path-levelized FHE computations. We describe how path levelization reduces control flow ambiguity and improves encrypted computation efficiency. Using these techniques and data-dependent loops as a starting point, we then build support for hierarchical programs made up of phases, where each phase corresponds to a fixed point computation that can be used to further improve the efficiency of encrypted computation. In our setting, the adversary learns an estimate of the number of steps required to complete the computation, which we show is the least amount of leakage possible

    Generalized external interaction with tamper-resistant hardware with bounded information leakage

    Get PDF
    This paper investigates secure ways to interact with tamper-resistant hardware leaking a strictly bounded amount of information. Architectural support for the interaction mechanisms is studied and performance implications are evaluated. The interaction mechanisms are built on top of a recently-proposed secure processor Ascend[ascend-stc12]. Ascend is chosen because unlike other tamper-resistant hardware systems, Ascend completely obfuscates pin traffic through the use of Oblivious RAM (ORAM) and periodic ORAM accesses. However, the original Ascend proposal, with the exception of main memory, can only communicate with the outside world at the beginning or end of program execution; no intermediate information transfer is allowed. Our system, Stream-Ascend, is an extension of Ascend that enables intermediate interaction with the outside world. Stream-Ascend significantly improves the generality and efficiency of Ascend in supporting many applications that fit into a streaming model, while maintaining the same security level.Simulation results show that with smart scheduling algorithms, the performance overhead of Stream-Ascend relative to an insecure and idealized baseline processor is only 24.5%, 0.7%, and 3.9% for a set of streaming benchmarks in a large dataset processing application. Stream-Ascend is able to achieve a very high security level with small overheads for a large class of applications.National Science Foundation (U.S.). Graduate Research Fellowship Program (Grant 1122374)American Society for Engineering Education. National Defense Science and Engineering Graduate FellowshipUnited States. Defense Advanced Research Projects Agency (Clean-slate design of Resilient, Adaptive, Secure Hosts Contract N66001-10-1-4089

    Freecursive ORAM: [Nearly] Free Recursion and Integrity Verification for Position-based Oblivious RAM

    Get PDF
    Oblivious RAM (ORAM) is a cryptographic primitive that hides memory access patterns as seen by untrusted storage. Recently, ORAM has been architected into secure processors. A big challenge for hardware ORAM schemes is how to efficiently manage the Position Map (PosMap), a central component in modern ORAM algorithms. Implemented naively, the PosMap causes ORAM to be fundamentally unscalable in terms of on-chip area. On the other hand, a technique called Recursive ORAM fixes the area problem yet significantly increases ORAM's performance overhead. To address this challenge, we propose three new mechanisms. We propose a new ORAM structure called the PosMap Lookaside Buffer (PLB) and PosMap compression techniques to reduce the performance overhead from Recursive ORAM empirically (the latter also improves the construction asymptotically). Through simulation, we show that these techniques reduce the memory bandwidth overhead needed to support recursion by 95%, reduce overall ORAM bandwidth by 37% and improve overall SPEC benchmark performance by 1.27x. We then show how our PosMap compression techniques further facilitate an extremely efficient integrity verification scheme for ORAM which we call PosMap MAC (PMMAC). For a practical parameterization, PMMAC reduces the amount of hashing needed for integrity checking by >= 68x relative to prior schemes and introduces only 7% performance overhead. We prototype our mechanisms in hardware and report area and clock frequency for a complete ORAM design post-synthesis and post-layout using an ASIC flow in a 32~nm commercial process. With 2 DRAM channels, the design post-layout runs at 1~GHz and has a total area of .47~mm2. Depending on PLB-specific parameters, the PLB accounts for 10% to 26% area. PMMAC costs 12% of total design area. Our work is the first to prototype Recursive ORAM or ORAM with any integrity scheme in hardware.Qatar Computing Research Institute (QCRI-CSAIL Parternship)National Science Foundation (U.S.)American Society for Engineering Education. National Defense Science and Engineering Graduate Fellowshi

    Suppressing the Oblivious RAM timing channel while making information leakage and program efficiency trade-offs

    Get PDF
    Oblivious RAM (ORAM) is an established cryptographic technique to hide a program's address pattern to an untrusted storage system. More recently, ORAM schemes have been proposed to replace conventional memory controllers in secure processor settings to protect against information leakage in external memory and the processor I/O bus. A serious problem in current secure processor ORAM proposals is that they don't obfuscate when ORAM accesses are made, or do so in a very conservative manner. Since secure processors make ORAM accesses on last-level cache misses, ORAM access timing strongly correlates to program access pattern (e.g., locality). This brings ORAM's purpose in secure processors into question. This paper makes two contributions. First, we show how a secure processor can bound ORAM timing channel leakage to a user-controllable leakage limit. The secure processor is allowed to dynamically optimize ORAM access rate for power/performance, subject to the constraint that the leakage limit is not violated. Second, we show how changing the leakage limit impacts program efficiency. We present a dynamic scheme that leaks at most 32 bits through the ORAM timing channel and introduces only 20% performance overhead and 12% power overhead relative to a baseline ORAM that has no timing channel protection. By reducing leakage to 16 bits, our scheme degrades in performance by 5% but gains in power efficiency by 3%. We show that a static (zero leakage) scheme imposes a 34% power overhead for equivalent performance (or a 30% performance overhead for equivalent power) relative to our dynamic scheme.United States. Dept. of Defense (National Defense Science and Engineering Graduate (NDSEG) Fellowship)United States. Defense Advanced Research Projects Agency. Clean-slate Design of Resilient, Adaptive, Secure Hosts (CRASH) Program (Contract N66001-10-2-4089

    Path ORAM: An Extremely Simple Oblivious RAM Protocol

    Get PDF
    We present Path ORAM, an extremely simple Oblivious RAM protocol with a small amount of client storage. Partly due to its simplicity, Path ORAM is the most practical ORAM scheme for small client storage known to date. We formally prove that Path ORAM requires log^2 N / log X bandwidth overhead for block size B = X log N. For block sizes bigger than Omega(log^2 N), Path ORAM is asymptotically better than the best known ORAM scheme with small client storage. Due to its practicality, Path ORAM has been adopted in the design of secure processors since its proposal.National Science Foundation (U.S.). Graduate Research Fellowship Program (Grant DGE-0946797)National Science Foundation (U.S.). Graduate Research Fellowship Program (Grant DGE-1122374)American Society for Engineering Education. National Defense Science and Engineering Graduate FellowshipNational Science Foundation (U.S.) (Grant CNS-1314857)United States. Defense Advanced Research Projects Agency (Clean-slate design of Resilient, Adaptive, Secure Hosts Grant N66001-10-2-4089

    Scalable, accurate multicore simulation in the 1000-core era

    Get PDF
    We present HORNET, a parallel, highly configurable, cycle-level multicore simulator based on an ingress-queued worm-hole router NoC architecture. The parallel simulation engine offers cycle-accurate as well as periodic synchronization; while preserving functional accuracy, this permits tradeoffs between perfect timing accuracy and high speed with very good accuracy. When run on 6 separate physical cores on a single die, speedups can exceed a factor of over 5, and when run on a two-die 12-core system with 2-way hyperthreading, speedups exceed 11 ×. Most hardware parameters are configurable, including memory hierarchy, interconnect geometry, bandwidth, crossbar dimensions, and parameters driving power and thermal effects. A highly parametrized table-based NoC design allows a variety of routing and virtual channel allocation algorithms out of the box, ranging from simple DOR routing to complex Valiant, ROMM, or PROM schemes, BSOR, and adaptive routing. HORNET can run in network-only mode using synthetic traffic or traces, directly emulate a MIPS-based multicore, or function as the memory subsystem for native applications executed under the Pin instrumentation tool. HORNET is freely available under the open-source MIT license at http://csg.csail.mit.edu/hornet/

    A Low-Latency, Low-Area Hardware Oblivious RAM Controller

    Get PDF
    We build and evaluate Tiny ORAM, an Oblivious RAM prototype on FPGA. Oblivious RAM is a cryptographic primitive that completely obfuscates an application’s data, access pattern, and read/write behavior to/from external memory (such as DRAM or disk). Tiny ORAM makes two main contributions. First, by removing an algorithmic bottleneck in prior work, Tiny ORAM is the first hardware ORAM design to support arbitrary block sizes (e.g., 64 Bytes to 4096 Bytes). With a 64 Byte block size, Tiny ORAM can finish an access in 1.4 µs, over 40X faster than the prior-art implementation. Second, through novel algorithmic and engineering-level optimizations, Tiny ORAM reduces the number of symmetric encryption operations by ~ 3X compared to a prior work. Tiny ORAM is also the first design to implement and report real numbers for the cost of symmetric encryption in hardware ORAM constructions. Putting it together, Tiny ORAM requires 18381 (5%) LUTs and 146 (13%) Block RAM on a Xilinx XC7VX485T FPGA, including the cost of encryptionQatar Computing Research Institute (QCRI-CSAIL Parternship)National Science Foundation (U.S.)American Society for Engineering Education. National Defense Science and Engineering Graduate Fellowshi

    Brief announcement: Distributed shared memory based on computation migration

    Get PDF
    Driven by increasingly unbalanced technology scaling and power dissipation limits, microprocessor designers have resorted to increasing the number of cores on a single chip, and pundits expect 1000-core designs to materialize in the next few years [1]. But how will memory architectures scale and how will these next-generation multicores be programmed? One barrier to scaling current memory architectures is the offchip memory bandwidth wall [1,2]: off-chip bandwidth grows with package pin density, which scales much more slowly than on-die transistor density [3]. To reduce reliance on external memories and keep data on-chip, today’s multicores integrate very large shared last-level caches on chip [4]; interconnects used with such shared caches, however, do not scale beyond relatively few cores, and the power requirements and access latencies of large caches exclude their use in chips on a 1000-core scale. For massive-scale multicores, then, we are left with relatively small per-core caches. Per-core caches on a 1000-core scale, in turn, raise the question of memory coherence. On the one hand, a shared memory abstraction is a practical necessity for general-purpose programming, and most programmers prefer a shared memory model [5]. On the other hand, ensuring coherence among private caches is an expensive proposition: bus-based and snoopy protocols don’t scale beyond relatively few cores, and directory sizes needed in cache-coherence protocols must equal a significant portion of the combined size of the per-core caches as otherwise directory evictions will limit performance [6]. Moreover, directory-based coherence protocols are notoriously difficult to implement and verify [7]

    Effects of antiplatelet therapy on stroke risk by brain imaging features of intracerebral haemorrhage and cerebral small vessel diseases: subgroup analyses of the RESTART randomised, open-label trial

    Get PDF
    Background Findings from the RESTART trial suggest that starting antiplatelet therapy might reduce the risk of recurrent symptomatic intracerebral haemorrhage compared with avoiding antiplatelet therapy. Brain imaging features of intracerebral haemorrhage and cerebral small vessel diseases (such as cerebral microbleeds) are associated with greater risks of recurrent intracerebral haemorrhage. We did subgroup analyses of the RESTART trial to explore whether these brain imaging features modify the effects of antiplatelet therapy
    corecore